
AWS Databricks Integration Setup

Complete guide for integrating AWS Databricks billing data with the nOps platform.


Prerequisites

  • AWS Account: Administrative access to deploy CloudFormation stacks
  • S3 Bucket: Existing bucket or ability to create one for billing data storage
  • AWS Databricks Workspace: Administrative access to create and schedule jobs
  • IAM Permissions: Ability to create and manage IAM roles and policies


How It Works

The AWS Databricks integration follows this process:

  1. S3 Bucket Setup - Configure S3 bucket for billing data storage
  2. CloudFormation Deployment - Deploy nOps-provided stack for secure access permissions
  3. Databricks Job Creation - Schedule daily job to export billing data to S3
  4. Automated Data Collection - nOps retrieves and processes data for Cost Analysis

Note: Ensure you're logged into the correct AWS account throughout the setup process.


Setup Instructions

Step 1: Access nOps Integrations

  1. Navigate to Organization Settings → Integrations → Inform
  2. Select Databricks from the available integrations

Inform Integrations Interface

Step 2: Configure S3 Bucket

Integrate Databricks with Inform

  1. Select AWS Account from the dropdown
  2. Enter Bucket Details:
    • Bucket Name: Your S3 bucket name
    • Prefix: Unique prefix (e.g., nops/) for file organization
  3. Click Setup to save configuration

Important: If you don't have an S3 bucket configured for Databricks, follow the S3 Bucket Setup Guide first.
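
If you still need to create the bucket yourself, a minimal boto3 sketch like the one below creates it and drops a marker object under the prefix you plan to enter above. The bucket name, region, and nops/ prefix are illustrative placeholders; substitute your own values, and treat the S3 Bucket Setup Guide as the authoritative reference.

```python
# Minimal sketch: create (or verify) the S3 bucket and prefix used for the
# Databricks billing export. Bucket name, region, and prefix are placeholders.
import boto3
from botocore.exceptions import ClientError

BUCKET = "my-databricks-billing-bucket"   # placeholder - your bucket name
REGION = "us-east-1"                      # placeholder - your bucket's region
PREFIX = "nops/"                          # must match the prefix entered in nOps

s3 = boto3.client("s3", region_name=REGION)

try:
    # Succeeds if the bucket already exists and you can access it
    s3.head_bucket(Bucket=BUCKET)
    print(f"Bucket {BUCKET} already exists")
except ClientError:
    # us-east-1 does not accept a LocationConstraint; other regions require one
    if REGION == "us-east-1":
        s3.create_bucket(Bucket=BUCKET)
    else:
        s3.create_bucket(
            Bucket=BUCKET,
            CreateBucketConfiguration={"LocationConstraint": REGION},
        )
    print(f"Created bucket {BUCKET}")

# Optional: write an empty marker object so the prefix is visible in the console
s3.put_object(Bucket=BUCKET, Key=PREFIX)
```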

Step 3: Deploy CloudFormation Stack

  1. You'll be redirected to AWS CloudFormation with pre-filled parameters
  2. Check the acknowledgment box for IAM resource creation
  3. Click Create Stack

User Acknowledgement of Automated Resource Creation

Note: Ensure the stack deploys successfully; the resources it creates are what grant nOps access to the exported billing data.
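
If you'd like to confirm the deployment programmatically rather than watching the console, a quick boto3 check like the following works. The stack name and region are placeholders; use the values shown in your CloudFormation console for the nOps stack.

```python
# Minimal sketch: confirm the nOps CloudFormation stack finished deploying.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")  # placeholder region

stack_name = "nops-databricks-integration"  # placeholder - use your stack's name
status = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]["StackStatus"]

print(f"{stack_name}: {status}")
assert status == "CREATE_COMPLETE", "Stack has not finished deploying successfully yet"
```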

Step 4: Create Databricks Export Job

After successful CloudFormation deployment, you'll see the integration overview:

Overview of the Databricks Integration

  1. Generate Script:
    • Click the Generate Script button
    • Copy the script to your clipboard

Databricks Billing Extraction

  2. Create Databricks Notebook:

    • Log in to your Databricks workspace
    • Navigate to Workspace → Create → Notebook
    • Name it (e.g., NopsDatabricksBillingDataUploader)
    • Paste the copied script (an illustrative sketch of what such a script does appears after this list)
  3. Schedule the Job:

    • Click Schedule in the notebook toolbar
    • Configure the schedule settings:
      • Set frequency to Every 1 day
      • Select appropriate compute cluster
      • Click More options, then + Add
      • Enter role_arn as key and the role ARN as value
    • Click Create to finalize the schedule

    Scheduled Execution of Databricks Notebooks

  4. Configure Job Parameters:

    • In the Parameters section, enter role_arn in the key field and the actual role ARN in the value field

    Dependencies of Scheduled Jobs in Databricks
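
For context, the script nOps generates essentially does one thing: pull Databricks billing usage and write it to the bucket and prefix you configured. Always use the generated script itself; the notebook sketch below is only an illustration of that shape, assuming Unity Catalog system billing tables are enabled, with a placeholder output path.

```python
# Illustrative sketch only - paste the script generated by nOps, not this code.
# Runs inside a Databricks notebook, where `spark` and `dbutils` are provided
# by the runtime. Assumes Unity Catalog system billing tables are enabled.

# Read the role_arn job parameter added in the schedule/parameters dialog.
# How the real script uses it (e.g. assuming the role for the S3 write) is
# defined by the nOps-generated code.
role_arn = dbutils.widgets.get("role_arn")

# Placeholder output location - must match the bucket and prefix configured in nOps
output_path = "s3://my-databricks-billing-bucket/nops/billing_usage/"

# Pull recent billing usage from the Databricks system table
usage_df = spark.sql("""
    SELECT *
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 2)
""")

# Write the export to S3 so nOps can retrieve it
(usage_df
    .write
    .mode("overwrite")
    .format("parquet")
    .save(output_path))

print(f"Exported {usage_df.count()} rows to {output_path} (role_arn: {role_arn})")
```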

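If you'd rather create the daily schedule programmatically than through the notebook's Schedule dialog, the Databricks Jobs API (2.1) can define an equivalent job. The workspace host, token, cluster ID, notebook path, and role ARN below are all placeholders.

```python
# Sketch: create the daily export job via the Databricks Jobs API (2.1)
# instead of the Schedule dialog. All identifiers below are placeholders.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<databricks-personal-access-token>"                        # placeholder

job_spec = {
    "name": "NopsDatabricksBillingDataUploader",
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",   # daily at 02:00
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
    "tasks": [
        {
            "task_key": "export_billing_data",
            "existing_cluster_id": "<cluster-id>",  # placeholder compute cluster
            "notebook_task": {
                "notebook_path": "/Users/<you>/NopsDatabricksBillingDataUploader",
                "base_parameters": {
                    # Same role_arn key/value pair described in the steps above
                    "role_arn": "arn:aws:iam::<account-id>:role/<nops-role>",
                },
            },
        }
    ],
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```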

Next Steps

  • Monitor Integration: Data will appear in Cost Analysis within 24 hours
  • Optimize Usage: Use nOps Cost Analysis tools to identify optimization opportunities
  • Set Alerts: Configure cost alerts and notifications for your Databricks usage
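
To confirm the daily export is actually landing in S3 before it surfaces in Cost Analysis, a quick listing of the configured prefix is usually enough. The bucket name and prefix below are placeholders; use the values you entered in Step 2.

```python
# Quick check: list recent objects under the configured prefix to confirm
# the daily Databricks export is landing in S3. Names are placeholders.
import boto3

BUCKET = "my-databricks-billing-bucket"  # placeholder
PREFIX = "nops/"                         # the prefix configured in nOps

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX, MaxKeys=20)

for obj in resp.get("Contents", []):
    print(obj["LastModified"], obj["Size"], obj["Key"])

if resp.get("KeyCount", 0) == 0:
    print("No export files found yet - check the Databricks job run history.")
```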

For general questions about Databricks integrations, see the main Databricks Exports page.